Introduction

Good day Mr. CEO & CFO. I’d like to thank you for accepting my invitation. Today, I’d like to address the questions you recently posed during our last Steering Committee meeting and provide you with a brief analysis that may pique your interest. This research was done using the Beer and Breweries data provided.

# Install all needed R libraries and packages.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggthemes)
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(mapproj)
library(dplyr)
#library(stringi)
#library(stringr)

#Cookbook

#Load beers data
beers = read.csv(file.choose())

#Load breweries data
breweries = read.csv(file.choose())


#Quick look at the dataframe for both datasets.
head(beers)
##                  Name Beer_ID   ABV IBU Brewery_id
## 1            Pub Beer    1436 0.050  NA        409
## 2         Devil's Cup    2265 0.066  NA        178
## 3 Rise of the Phoenix    2264 0.071  NA        178
## 4            Sinister    2263 0.090  NA        178
## 5       Sex and Candy    2262 0.075  NA        178
## 6        Black Exodus    2261 0.077  NA        178
##                            Style Ounces
## 1            American Pale Lager     12
## 2        American Pale Ale (APA)     12
## 3                   American IPA     12
## 4 American Double / Imperial IPA     12
## 5                   American IPA     12
## 6                  Oatmeal Stout     12
head(breweries)
##   Brew_ID                      Name          City State
## 1       1        NorthGate Brewing    Minneapolis    MN
## 2       2 Against the Grain Brewery    Louisville    KY
## 3       3  Jack's Abby Craft Lagers    Framingham    MA
## 4       4 Mike Hess Brewing Company     San Diego    CA
## 5       5   Fort Point Beer Company San Francisco    CA
## 6       6     COAST Brewing Company    Charleston    SC

How many breweries are present in each state?

#Use the breweries dataset to count the number of breweries present in each State.
#This data will be stored in the variable "breweries_Count_State."
breweries_Count_State = dplyr::count(breweries,State)


#Rename the column n to a recognizable name.
colnames(breweries_Count_State)[2] ="Count"

####Changed count to 2 to make it more legible
#Here I'm using the totals variable to display the number of breweries per State.
breweries_Count_State %>% ggplot(mapping = aes(x = State, y = Count)) +
  geom_bar(stat = "identity", color = "darkgrey", fill = "darkslategray4" )+
  xlab("States")  + ylab("Count of Breweries")+
  geom_text(aes(State, Count+1, label = Count), data = breweries_Count_State)+
  ggtitle("Breweries per State")+
  theme_tufte()

Merge beer data with the breweries data. Print the first 6 observations and the last six observations to check the merged file. (RMD only, this does not need to be included in the presentation or the deck.)

#Use the Inner join method to merge the beers and breweries dataset via the Brew_ID and Brewery_ID
beers_Breweries = merge(beers, breweries, by.x = "Brewery_id", by.y = "Brew_ID")

head(beers_Breweries)
##   Brewery_id        Name.x Beer_ID   ABV IBU
## 1          1  Get Together    2692 0.045  50
## 2          1 Maggie's Leap    2691 0.049  26
## 3          1    Wall's End    2690 0.048  19
## 4          1       Pumpion    2689 0.060  38
## 5          1    Stronghold    2688 0.060  25
## 6          1   Parapet ESB    2687 0.056  47
##                                 Style Ounces             Name.y        City
## 1                        American IPA     16 NorthGate Brewing  Minneapolis
## 2                  Milk / Sweet Stout     16 NorthGate Brewing  Minneapolis
## 3                   English Brown Ale     16 NorthGate Brewing  Minneapolis
## 4                         Pumpkin Ale     16 NorthGate Brewing  Minneapolis
## 5                     American Porter     16 NorthGate Brewing  Minneapolis
## 6 Extra Special / Strong Bitter (ESB)     16 NorthGate Brewing  Minneapolis
##   State
## 1    MN
## 2    MN
## 3    MN
## 4    MN
## 5    MN
## 6    MN
tail(beers_Breweries)
##      Brewery_id                    Name.x Beer_ID   ABV IBU
## 2405        556             Pilsner Ukiah      98 0.055  NA
## 2406        557  Heinnieweisse Weissebier      52 0.049  NA
## 2407        557           Snapperhead IPA      51 0.068  NA
## 2408        557         Moo Thunder Stout      50 0.049  NA
## 2409        557         Porkslap Pale Ale      49 0.043  NA
## 2410        558 Urban Wilderness Pale Ale      30 0.049  NA
##                        Style Ounces                        Name.y          City
## 2405         German Pilsener     12         Ukiah Brewing Company         Ukiah
## 2406              Hefeweizen     12       Butternuts Beer and Ale Garrattsville
## 2407            American IPA     12       Butternuts Beer and Ale Garrattsville
## 2408      Milk / Sweet Stout     12       Butternuts Beer and Ale Garrattsville
## 2409 American Pale Ale (APA)     12       Butternuts Beer and Ale Garrattsville
## 2410        English Pale Ale     12 Sleeping Lady Brewing Company     Anchorage
##      State
## 2405    CA
## 2406    NY
## 2407    NY
## 2408    NY
## 2409    NY
## 2410    AK
#Rename the columns Name.x and Name.y to be more meaningful names.
colnames(beers_Breweries)[2] = "Beer_Name"
colnames(beers_Breweries)[8] = "Brewery_Name"

#The following columns had NA values/missing data: ABV, IBU, and Styles(these were blank and Not N/A). To Tidy the data, we excluded those values from the dataset. We do not have a way to find those data points at this time. 
beersNbreweries = na.omit(beers_Breweries)

Compute the median alcohol content and international bitterness unit for each state. Plot a bar chart to compare.

#Compute the median the ABV (alcohol content) for each state.
ABV_Median_State = beersNbreweries %>% group_by(State) %>% dplyr::summarize(ABV_Median = median(ABV), count = n())

#Plot a bar chart
ABV_Median_State %>% ggplot(mapping = aes(x= State, y = ABV_Median)) +
  geom_bar(stat = "identity",color = "darkgrey", fill = "darkslategray4")+
  geom_text(aes(State, ABV_Median+0.002, label = ABV_Median), data = ABV_Median_State, size=2.3)+ 
  theme_tufte()+
  ggtitle("Median ABV per State") + xlab("STATES") + ylab ("ABV Mean")

#Compute the median IBU (International bitterness Unit)
IBU_Median_State = beersNbreweries %>% group_by(State) %>% dplyr::summarize(IBU_Median = median(IBU), count= n())


#Plot a bar chart
  IBU_Median_State %>% ggplot(mapping = aes(x= State, y = IBU_Median )) +
  geom_bar(stat = "identity",color = "darkgrey", fill = "darkslategray4")+ 
  geom_text(aes(State, IBU_Median+1, label = IBU_Median), data = IBU_Median_State, size=4)+
  theme_tufte()+
  ggtitle("Median IBU per State") + xlab("STATES") + ylab ("IBU Mean")

# Which state has the maximum alcoholic (ABV) beer? Which state has the most bitter (IBU) beer?

# Finds the Max number within the ABV variable and the row number (index) in which the maximum ABV number may be found.
 
state_Max_ABV = beersNbreweries %>% summarise(Max_ABV= max(ABV), row_Index = which.max(ABV))
state_Max_ABV # The maximum ABV is 0.125 and can be found in row 8.
##   Max_ABV row_Index
## 1   0.125         8
#Uses the row number (8) from above to find the State that coincides with the maximum ABV. 
beersNbreweries$State[8]
## [1] " KY"
#The State is Kentucky (KY).

# Find the Max number within the IBU variable and the row number (index) in which the maximum IBU number may be found.

state_Max_IBU = beersNbreweries %>% summarise(Max_IBU= max(IBU), row_Index = which.max(IBU))
state_Max_IBU # The maximum IBU is 138 and can be found in row 1134.
##   Max_IBU row_Index
## 1     138      1134
#Uses the row number (1134) from above to find the State that coincides with the maximum IBU.
beersNbreweries$State[1134]
## [1] " OR"
#The State is Oregon (OR).

Comment on the summary statistics and distribution of the ABV variable.

#Statistics of the ABV variable
beersNbreweries %>%
  group_by(State) %>%
  dplyr::summarize(mean = mean(ABV), sd = sd(ABV), median = median(ABV), range = max(ABV)- min(ABV), IQR = IQR(ABV), count = n())
## # A tibble: 50 x 7
##    State   mean       sd median range     IQR count
##    <chr>  <dbl>    <dbl>  <dbl> <dbl>   <dbl> <int>
##  1 " AK" 0.0562  0.00525 0.057  0.017 0.008      17
##  2 " AL" 0.0622  0.0132  0.06   0.043 0.00400     9
##  3 " AR" 0.04   NA       0.04   0     0           1
##  4 " AZ" 0.0610  0.0126  0.0575 0.053 0.0163     24
##  5 " CA" 0.0629  0.0150  0.058  0.062 0.0225    135
##  6 " CO" 0.0648  0.0144  0.065  0.057 0.0177    146
##  7 " CT" 0.0638  0.0219  0.061  0.054 0.0323      6
##  8 " DC" 0.065   0.0191  0.059  0.042 0.0195      4
##  9 " DE" 0.055  NA       0.055  0     0           1
## 10 " FL" 0.0628  0.0102  0.062  0.037 0.017      37
## # ... with 40 more rows
#Box Plot to show the Statistics of the ABV variable.
beersNbreweries %>% 
  ggplot(mapping = aes(x = ABV))+
  geom_boxplot(color = "darkgrey", fill = "darkslategray4")+
  ggtitle("ABV Variable Box Plot")+
  theme_tufte()

# I Created a Histogram of ABV to see its distribution.
beersNbreweries %>% ggplot(mapping = aes(x = ABV))+
  geom_histogram(color = "darkgrey", fill = "darkslategray4")+
  theme_tufte()+
  ggtitle("Histogram of ABV") + xlab("ABV Values") + ylab ("Count")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# The distribution of the ABV variable is right skewed.

Is there an apparent relationship between the bitterness of the beer and its alcoholic content? Draw a scatter plot. Make your best judgment of a relationship and EXPLAIN your answer

#Scatter Plot showing the distribution of the ABV versus the IBU variables.

g = beersNbreweries %>% ggplot(mapping = aes(x = ABV, y = IBU, position ="jitter")) +
  geom_point(color = "darkslategray4")+
  theme_tufte()+
  xlab("ABV") + ylab("IBU")+
  ggtitle("Relationship between bitterness and Alcohol content")+
  geom_smooth(aes(x = ABV, y = IBU))

ggplotly(g)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# The relationship between the ABV and IBU appears to be positively linear. As the ABV increases we can see that IBU also increases. There is a cluster around the 0.050 ABV value, indicating that a great majority of beers try to stay near that number.In our data set most of the ABV data was below the 0.100 ABV value. There were only two beers that went above that number.However, the IBU values was not as high as expected given that the trend is linear. These two beers were in the States of Indiana and Kentucky. 

#Budweiser would also like to investigate the difference with respect to IBU and ABV between IPAs (India Pale Ales) and other types of Ale (any beer with “Ale” in its name other than IPA). You decide to use KNN classification to investigate this relationship. Provide statistical evidence one way or the other. You can of course assume your audience is comfortable with percentages … KNN is very easy to understand conceptually.

#Due June 26th


# (grep("SUSHI", df_Baltimore_Restaurants$name))
# length(grep("SUSHI", df_Baltimore_Restaurants$name))
# sum(grepl("SUSHI", df_Baltimore_Restaurants$name))
# which(grepl("SUSHI", df_Baltimore_Restaurants$name))
# which(grepl("0.125",beersNbreweries$State))